66 research outputs found

    Remedying Sound Source Separation via Azimuth Discrimination and Re-synthesis

    Commercially recorded music since the 1950s has been mixed down from many input sound sources to a two-channel reproduction of these sources. The effect of this approach is to assign sources to locations in a stereo field using a pan position for each source. The Adress algorithm is a popular way of extracting individual music sound sources from a stereo mixture. A drawback of the Adress algorithm is that when time-frequency components in the stereo mixture are shared between two or more sources, calculating the inter-aural intensity scaling parameter for each source for that time-frequency component is challenging. We show how to obtain a good-quality inverse of the pan-mixing process in the time-frequency components which are shared between different sources using a new method called Redress. We demonstrate that we can estimate how much of each source is active in time-frequency components which are shared between sources for two- and three-source music mixtures. The consequence of this is that audible artefacts are not as prominent in the source estimates.

    APHONIC: Adaptive thresholding for noise cancellation in smart mobile environments

    We propose a single-channel, adaptive threshold selection technique for binary mask construction, namely APHONIC (AdaPtive tHreshOlding for NoIse Cancellation), for smart mobile environments. Using this mask, we introduce two noise cancellation techniques that perform robustly in the presence of real-world interfering signals that are typically encountered by mobile users: a violin busker, a subway, and the sounds of a busy city square. We demonstrate that when the power of the time-frequency components of the voice of a mobile user does not significantly overlap with the components of the interference signal, the threshold learning and noise cancellation techniques significantly improve the Signal-to-Interference Ratio (SIR) and the Signal-to-Distortion Ratio (SDR) of the recovered voice. When a mobile user's speech is mixed with music or with the sounds of a city square or subway station, the speech energy is captured by a few large-magnitude coefficients, and APHONIC improves the SIR by more than 20 dB and the SDR by up to 5 dB. The robustness of the threshold selection step and the noise cancellation algorithms is evaluated using environments typically experienced by mobile phone users. Listening tests indicate that the interference signal is no longer audible in the denoised signals. We outline how this approach could be used in many mobile voice-driven applications.

    Reformulating the Binary Masking Approach of Adress as Soft Masking

    Binary masking forms the basis for a number of source separation approaches that have been successfully applied to the problem of de-mixing music sources from a stereo recording. A well-known problem with binary masking is that, when music sources overlap in the time-frequency domain, only one of the overlapping sources can be assigned the energy in a particular time-frequency bin. To overcome this problem, we reformulate the classical pan-pot source separation problem for music sources as a non-negative quadratic program. This reformulation gives rise to an algorithm, called Redress, which extends the popular Adress algorithm. It works by defining an azimuth trajectory for each source based on its spatial position within the stereo field. Redress allows for the allocation of energy in one time-frequency bin to multiple sources. We present results that show that for music recordings Redress improves the SNR, SAR, and SDR in comparison to the Adress algorithm.
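
The contrast between binary and soft allocation of one time-frequency bin can be illustrated with a toy example. This is a minimal sketch and not the Redress algorithm itself: it assumes exactly two sources with known, distinct pan positions, assumes that source magnitudes add linearly in each channel, and uses a hypothetical linear pan law in which a source with pan `p` contributes `(1 - p)` of its magnitude to the left channel and `p` to the right. The function names are invented for illustration.

```python
def binary_allocation(left, right, pans):
    """Binary mask: give the whole bin to the source whose pan best matches it."""
    total = left + right
    observed = right / total if total > 0 else 0.5
    winner = min(range(len(pans)), key=lambda j: abs(pans[j] - observed))
    return [total if j == winner else 0.0 for j in range(len(pans))]


def soft_allocation(left, right, pans):
    """Two-source non-negative least squares for one bin.

    Assumed model (illustrative only):
        left  = (1 - p1) * s1 + (1 - p2) * s2
        right =      p1  * s1 +      p2  * s2
    with s1, s2 >= 0 and p1 != p2.
    """
    p1, p2 = pans
    det = p2 - p1  # determinant of the 2x2 mixing matrix
    s1 = (left * p2 - right * (1.0 - p2)) / det
    s2 = (right * (1.0 - p1) - left * p1) / det
    if s1 >= 0.0 and s2 >= 0.0:
        return [s1, s2]
    # The unconstrained solution is infeasible, so the optimum lies on the
    # boundary: fit each source alone and keep the smaller residual.
    best = None
    for j, p in enumerate(pans):
        a_l, a_r = 1.0 - p, p
        s = max(0.0, (a_l * left + a_r * right) / (a_l * a_l + a_r * a_r))
        err = (left - a_l * s) ** 2 + (right - a_r * s) ** 2
        if best is None or err < best[0]:
            best = (err, j, s)
    _, j, s = best
    return [s if i == j else 0.0 for i in range(2)]
```

For a bin built from magnitudes 3 and 1 at pans 0.2 and 0.7 (left 2.7, right 1.3), `soft_allocation` recovers both magnitudes, whereas `binary_allocation` hands all of the energy to the first source.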

    Tiled Time Delay Estimation in Mobile Cloud Computing Environments

    We present a tiled delay estimation technique in the context of Mobile Cloud Computing (MCC) environments. We examine its accuracy in the presence of multiple sources (1) for sub-sample delays and (2) in the presence of phase wrap-around. Phase wrap-around is prevalent in MCC because the separation of acoustic sources may be large. We show that tiling a histogram of instantaneous phase estimates can improve delay estimates when phase wrap-around is significantly present and also when multiple sources are present. We report that the error in the delay estimator is generally less than 5% of a sample when the true delay is up to 10 samples for three-source mixtures.
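
Why phase wrap-around complicates delay estimation, and one generic way around it, can be shown in a few lines. The sketch below is not the tiled-histogram method of the abstract. It assumes a single source, noise-free inter-channel phases, and a known upper bound on the delay, and it swaps in a plain grid search that scores each candidate delay by how consistent it is with the wrapped phases. All names are hypothetical.

```python
import math


def wrapped_phase(delay, omega):
    """Inter-channel phase of a pure delay, wrapped into [-pi, pi]."""
    phase = -omega * delay
    return math.atan2(math.sin(phase), math.cos(phase))


def estimate_delay(phases, omegas, max_delay, step=0.01):
    """Grid search for the delay (in samples) most consistent with the phases.

    Each candidate d is scored by sum(cos(phase_k + omega_k * d)). The cosine
    ignores multiples of 2*pi, so wrapped phases need no explicit unwrapping.
    """
    best_delay, best_score = 0.0, -math.inf
    for i in range(int(round(max_delay / step)) + 1):
        candidate = i * step
        score = sum(math.cos(p + w * candidate) for p, w in zip(phases, omegas))
        if score > best_score:
            best_delay, best_score = candidate, score
    return best_delay


# Example: 31 frequency bins of a 64-point transform, true delay 7.3 samples.
omegas = [2.0 * math.pi * k / 64 for k in range(1, 32)]
phases = [wrapped_phase(7.3, w) for w in omegas]
estimate = estimate_delay(phases, omegas, max_delay=10.0)
```

With a delay of several samples, most of the phases above wrap at least once, so a naive per-bin estimate `-phase / omega` would be wrong for those bins; the search over candidate delays avoids that, at the cost of evaluating every grid point.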

    Power-Weighted LPC Formant Estimation

    A power-weighted formant frequency estimation procedure based on Linear Predictive Coding (LPC) is presented. It works by pre-emphasizing the dominant spectral components of an input signal, which allows a subsequent estimation step to extract formant frequencies with greater accuracy. The accuracy of traditional LPC formant estimation is improved by this new power-weighted formant estimator for different classes of synthetic signals and for speech. Power-weighted LPC significantly and reliably outperforms LPC and variants of LPC at the task of formant estimation using the VTR formants dataset, a database consisting of the Vocal Tract Resonance (VTR) frequency trajectories obtained by human experts for the first three formant frequencies. This performance gain is evident over a range of filter orders.
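
For readers unfamiliar with the baseline, conventional (unweighted) LPC formant estimation can be sketched in its simplest form. The example below does not implement the power-weighted procedure. It assumes a single synthetic resonance, a second-order predictor fitted by least squares (the covariance method), and a formant frequency read from the angle of the predictor's complex pole. The function names and the test signal are hypothetical.

```python
import cmath
import math


def synth_resonance(freq_hz, radius, fs, n_samples):
    """Impulse response of a two-pole resonator with poles at radius*e^{+-j*theta}."""
    theta = 2.0 * math.pi * freq_hz / fs
    a1, a2 = 2.0 * radius * math.cos(theta), -radius * radius
    x = [1.0, a1]
    for _ in range(2, n_samples):
        x.append(a1 * x[-1] + a2 * x[-2])
    return x


def lpc2(x):
    """Least-squares fit of x[n] ~ c1*x[n-1] + c2*x[n-2] via the normal equations."""
    r11 = r12 = r22 = b1 = b2 = 0.0
    for n in range(2, len(x)):
        r11 += x[n - 1] * x[n - 1]
        r12 += x[n - 1] * x[n - 2]
        r22 += x[n - 2] * x[n - 2]
        b1 += x[n] * x[n - 1]
        b2 += x[n] * x[n - 2]
    det = r11 * r22 - r12 * r12
    return (b1 * r22 - b2 * r12) / det, (r11 * b2 - r12 * b1) / det


def formant_from_lpc2(c1, c2, fs):
    """Frequency (Hz) of the pole of 1 / (1 - c1*z^-1 - c2*z^-2).

    If the poles are real there is no resonance and the result is 0 or fs/2.
    """
    root = (c1 + cmath.sqrt(c1 * c1 + 4.0 * c2)) / 2.0
    return abs(cmath.phase(root)) * fs / (2.0 * math.pi)


signal = synth_resonance(700.0, 0.97, 8000.0, 200)
formant_hz = formant_from_lpc2(*lpc2(signal), 8000.0)
```

Real speech needs a higher filter order, windowing, and a root finder for the full predictor polynomial; the second-order case is used here only because its roots follow from the quadratic formula.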

    Effect of System Load on Video Service Metrics

    Model selection, in order to learn the mapping between the kernel metrics of a machine in a server cluster and a service quality metric on a client's machine, has been addressed by directly applying Linear Regression (LR) to the observations. The popularity of the LR approach is due to: 1) its implementation efficiency; 2) its low computational complexity; and 3) the fact that it generally captures the data relatively accurately. LR can, however, produce misleading results if the LR model does not characterize the system: this deception is due in part to its accuracy. In the client-server service modeling literature, LR is applied to the server and client metrics without treating the load on the system as the cause for the excitation of the system. By contrast, in this paper, we propose a generative model for the server and client metrics and a hierarchical model to explain the mapping between them, which is cognizant of the effects of the load on the system. Evaluations using real traces support the following conclusions: the system load accounts for ≥ 50% of the energy of a high proportion of the client and server metric traces, so modeling the load is crucial; the load signal is localized in the frequency domain, so we can remove the load by deconvolution; and there is a significant phase shift between the kernel and the service-level metrics, which, coupled with the load, heavily biases the results obtained from out-of-the-box LR without any system identification pre-processing.

    A Proactive-Restoration Technique for SDNs

    Failure incidents temporarily prevent the network from delivering services properly. Such a deterioration in services is called service unavailability. The traditional fault management techniques, i.e. protection and restoration, inevitably suffer from service unavailability because of the convergence time required to recover when a failure occurs. However, with the global view provided by software-defined networking, failure prediction is becoming attainable, which in turn reduces the service interruptions caused by failures. In this paper, we propose a proactive restoration technique that reconfigures the vulnerable routes which are likely to be affected if the predicted failure indeed occurs. The proposed approach allocates the alternative routes based on the probability of failure. Experimental evaluation on real-world and synthetic topologies demonstrates that the network service availability can be improved with the proposed technique to reach up to 97%. Based on the obtained results, further directions are suggested towards achieving further advances in this research area.

    Poster: Acoustic Source Localization Using Straight Line Approximations

    This short paper extends an acoustic signal delay estimation method to general anechoic scenarios using image processing techniques. The technique proposed in this paper localizes acoustic speech sources by creating a matrix of phase versus frequency histograms, where the same phases are stacked in appropriate bins. With larger delays and multiple sources coexisting in the same matrix, it becomes cluttered with activated bins. This results in high-intensity spots on the spectrogram, making source discrimination difficult. In this paper, we have employed morphological filtering, chain-coding and straight line approximations to ignore noise and enhance the target signal features. Lastly, the Hough transform is used for the source localization. The resulting estimates are accurate and invariant to the sampling rate, and should have applications in acoustic source separation.